Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Author
Bruno Scherrer
Abstract
Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted policy. We consider two variations of PI: Howard’s PI, which changes the actions in all states with a positive advantage, and Simplex-PI, which only changes the action in the state with maximal advantage. We show that Howard’s PI terminates after at most n(m − 1)⌈...
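To make the two variants concrete, here is a minimal NumPy sketch, not code from the paper: the P[a, s, s'] / R[s, a] array layout and all names such as policy_iteration are illustrative assumptions. Howard’s PI switches the action in every state whose advantage is positive, while Simplex-PI switches only in the state with the maximal advantage.

```python
import numpy as np

def policy_value(P, R, pi, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    n = P.shape[1]
    P_pi = P[pi, np.arange(n)]        # row s: next-state distribution under pi[s]
    r_pi = R[np.arange(n), pi]        # reward collected in each state under pi
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def advantages(P, R, v, gamma):
    """A[s, a] = Q(s, a) - v(s); positive entries mark improvable actions."""
    Q = R + gamma * (P @ v).T         # (P @ v)[a, s] = E[v(s') | s, a]
    return Q - v[:, None]

def policy_iteration(P, R, gamma, variant="howard", tol=1e-10):
    """Run Howard's PI or Simplex-PI until no state has a positive advantage."""
    n = P.shape[1]
    pi = np.zeros(n, dtype=int)       # start from an arbitrary policy
    while True:
        v = policy_value(P, R, pi, gamma)
        A = advantages(P, R, v, gamma)
        if A.max() <= tol:            # no positive advantage anywhere: pi is optimal
            return pi, v
        if variant == "howard":       # switch every state with a positive advantage
            improvable = A.max(axis=1) > tol
            pi[improvable] = A[improvable].argmax(axis=1)
        else:                         # simplex: switch only the single best state
            s = A.max(axis=1).argmax()
            pi[s] = A[s].argmax()

# Tiny random MDP to exercise both variants.
rng = np.random.default_rng(0)
m, n = 3, 5
P = rng.dirichlet(np.ones(n), size=(m, n))   # P[a, s] is a distribution over s'
R = rng.standard_normal((n, m))
print(policy_iteration(P, R, 0.9, "howard")[0])
print(policy_iteration(P, R, 0.9, "simplex")[0])
```

With exact policy evaluation both variants converge to an optimal policy; the bounds studied in the abstract control how many of these outer iterations can occur.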
Similar resources
UPPER BOUNDS FOR FINITENESS OF GENERALIZED LOCAL COHOMOLOGY MODULES
Let $R$ be a commutative Noetherian ring with non-zero identity and $\mathfrak{a}$ an ideal of $R$. Let $M$ be a finite $R$-module of finite projective dimension and $N$ an arbitrary finite $R$-module. We characterize the membership of the generalized local cohomology modules $H^{i}_{\mathfrak{a}}(M,N)$ in certain Serre subcategories of the category of modules from upper bounds. We define and study the properti...
Improved Strong Worst-case Upper Bounds for MDP Planning
The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP planning, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved strong worst-case upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of ...
An improved infeasible interior-point method for symmetric cone linear complementarity problem
We present an improved version of a full Nesterov-Todd step infeasible interior-point method for the linear complementarity problem over symmetric cones (Bull. Iranian Math. Soc., 40(3), 541-564, (2014)). In the earlier version, each iteration consisted of one so-called feasibility step and a few (at most three) centering steps. Here, each iteration consists of only a feasibility step. Thus, the new...
[hal-00829532, v3] Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted policy. We consider two variations of PI: Howard’s PI, which changes the actions in all states with a positive advantage, and Simplex-PI, which only changes the action in the state with maximal advan...
On the Complexity of Policy Iteration
Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first...
Journal: Math. Oper. Res.
Volume: 41, Issue: -
Pages: -
Publication date: 2013